On the prebiotic selection of nucleotide anomers: A computational study

Present-day known predominance of the β- over the α-anomers in nucleosides and nucleotides emerges from a thermodynamic analysis of their assembly from their components, i.e. bases, sugars, and a phosphate group. Furthermore, the incorporation of uracil into RNA and thymine into DNA rather than the other way around is also predicted from the calculations. An interplay of kinetics and thermodynamics must have driven evolutionary selection of life's building blocks. In this work, based on quantum chemical calculations, we focus on the latter control as a tool for “natural selection”.


Introduction
On what basis did early prebiotic conditions favor the selection and assembly of particular building blocks of nucleic acids? An aspect of this question is the subject of the present investigation. The broader as- nucleotides in the -configuration at the C1 ′ carbon of the furanose sugar and seldom in the -configuration ( Fig. 1), but why? The question is amplified by the ease by which anomers can be synthesized and by their similar ability to form Watson-Crick double helices (Fig. 1).
Another explored question is the factors that might have driven the selection of thymine for incorporation into DNA and for uracil into RNA. After all, the switching of the bases and sugars seems to be equally likely. Nucleosides, as predominantly existing in today's genetic material, are 2 ′ -deoxythymidine (dT) in DNA and uridine (U) in RNA rather than as T and dU. Why?
Thermodynamic and kinetic control drive chemical reactions. In this paper, the role of thermodynamics as a driver of evolutionary selection is being explored. The idea of thermodynamics-driven natural selection has been applied, for example, to explain the origin of the genetic code by Grosjean and Westhof (Grosjean and Westhof, 2016) and by Klump, Völker, and Breslauer (Klump et al., 2020). Here, this line of thought is extended to inquire whether the present-day forms of the building blocks of nucleic acids are, at least, aligned with such an energetic argument.
Clearly a natural thermodynamic selection of the -over theconfiguration or of the "correct" choice of U/T for the proper nucleic acid category, represent restricted questions from a vast repertoire of possible ones. For example, one may wish to inquire into the evolutionary pressures that picked the present-day particular arrangement of nucleic acids in terms of a pentose sugar, a phosphate group, and a nitrogenous base -rather than -say -having a 2-aminoethyl glycine as a linker, as occurs in peptide nucleic acids (PNAs) (Pellestor and Paulasova, 2004), instead of the phosphodiester backbone (Banack et al., 2012)? No one knows. One can question Nature's Central Dogma (DNA↔DNA→RNA→protein) (Crick, 1970) and whether this is the only conceivable way to produce living systems, etc. Clearly these wider questions are of utmost importance to understand the origins of life but are vastly larger than the scope of this investigation.
In 1955, Kaplan et al. (Kaplan et al., 1955) reported the isolation of a compound that had the same composition as nicotinamide adenine dinucleotide (NAD + ), the latter was termed diphosphopyridine nucleotide at the time. Subtle deviations in the properties of this compound from those anticipated for NAD + led these authors to conclude that it is an isomer of NAD + . Indeed, the compound discovered in 1955 is the -isomer of the NAD + (which has a -configuration at the glycosidic bond) (Kaplan et al., 1955). A decade later, Suzuki and co-workers (Suzuki et al., 1965) isolate bacterial Azotobacter vinelandii -NAD, -nicotinic acid adenine dinucleotide, -NADP, and -nicotinic acid mononucleotide. The latter work shows that, while much less frequent, the -form does occur indeed in living systems (Suzuki et al., 1965). Paoletti et al. report the experimental synthesis of an -complex between a -d(CCTTCC) hexanucleotide and its complementary -d(GGAAGG) (Paoletti et al., 1989). A comparison of the formation of this complex with its natural analog ( -d(CCTTCC) + -d(GGAAGG)) reveals that the formation of the non-natural form is only ≈1 kcal/mol more favored than its natural counterpart (Paoletti et al., 1989). There are other reports of synthesis of nucleic acids containing -nucleic acids strands (Froeyen et al., 2001;Guesnet et al., 1990;Lancelot et al., 1987Lancelot et al., , 1989. Kaur, Sharma, and Wetmore (KSW) have proposed barbituric acid and melamine (Kaur et al., 2017) and cyanuric acid (CA) and 2,4,6triaminopyrimidine (TAP) (Kaur et al., 2019) as precursors of prebiotic RNA on the basis of quantum mechanical calculations and molecular dynamics (MD) simulations. In their more recent paper (Kaur et al., 2019), KSW use density functional theory (DFT) calculations to obtain potential energy surfaces describing the rotation around the glycosidic bonds of -and -ribonucleosides of the non-canonical nucleobases TAP and CA as well as their complementary base pairing TAP:CA and stacking energies. Additionally, these authors studied the base pairing of these nucleobases with the canonical nucleobases (A, G, C, T and U) and compared the canonical 10-mer A-form of RNA duplexes 5 ′ -AAAAAAAAAA-3' paired with 5 ′ -UUUUUUUUUU-3 ′ and 5 ′ -AAAGCGCAAA-3 ′ paired with its complementary 5 ′ -UUUCGCGUUU-3 ′ with the non-canonical duplexes 5 ′ -AAAXXYYAAA-3 ′ paired with 3 ′ -UUUYYXXUUU-5', where X = CA and Y = TAP. The results obtained suggest that the strength for the hydrogen bonds created in the TAP:CA pairing is comparable to the canonical base pairing. The stacking of these non-canonical bases is, on the other hand, weaker compared to the canonical stacking, suggesting that the enhanced stacking may have served as a driving force in the evolution of nucleic acids. Finally, the assembled structure of TAP-CAcontaining helices suggests that this type of pre-RNA could have been shielded from water allowing its evolution and self-replication. The results in that paper, (Kaur et al., 2019), place TAP and CA as plausible candidates for a pre-or proto-RNA.
In their earlier study, KSW computational results on barbituric acid (BA) and melamine (MM) suggest their plausibility as non-canonical nucleic acid bases that may have been present in the precursor of present day nucleic acids (Kaur et al., 2017). The authors find that the strength of the hydrogen bonds between BA and MM makes them good candidates as building blocks of ancestral nucleic acids. On the other hand, they find that the stacking interactions were stronger when either BA or MM are combined with a canonical nucleobase than when the stacking was between each other. These results suggest the possibility of the existence of a pre-or proto-RNA that mixes canonical and non-canonical complementary nucleobases within one structure (Kaur et al., 2017).
In their first paper, KSW report that the potential energy hypersurface of breaking the glycosidic bond is consistent with a stronger bond in TAP nucleosides compared to canonical nucleosides. Interestingly, in the case of the CA the opposite result is obtained (Kaur et al., 2019). KSW found larger deglycosylation barriers for the C-C glycosidic bond of BA-ribonucleosides compared to canonical nucleobases while the reverse occurs in the case of MM (Kaur et al., 2017).
The biopolymers of life are believed to have emerged between 3.5 and 4 billion years ago (Kitadai and Maruyama, 2018;Sutherland and Whitfield, 1997), with details still to be worked-out. For instance, which was first: Proteins or nucleic acids? The consensus is that nucleic acids were probably the first biomolecules, specifically RNA in what is commonly known as the "RNA world hypothesis" (Cech, 2012;Gesteland et al., 1999), since RNA is both a catalyst and a repository of genetic information making it candidate for the first biopolymers (Ayukawa et al., 2019;Neveu et al., 2013;Orgel, 2004).
If we accept that RNA came first, then other questions arise. For example, how did this molecule originate in the first place? It has been proposed that nucleic acids were the product of prebiotic and geochemical reactions, namely, the "drying pool", "drying lagoon", also known as the "classic model", whereby regular cycles of dehydration-re-hydration can promote the polymer formation. Given the important role of the socalled "water problem" in early evolution, whereby H 2 O impedes the synthesis of nucleic acids (do Nascimento Vieira et al., 2020;Hud et al., 2013;Joyce and Orgel, 1999;Kim et al., 2016), in this work both vacuum phase and aqueous phase calculations were conducted. Within the classic model, Orgel and coworkers explored the formation of the glycosidic bond between purines (adenine, guanine, inosine, xanthine) and ribose sugar by drying and heating the reactants in the presence of catalysts (Fuller et al., 1972). Only adenine was found to couple with the ribose to produce -and -furanosil nucleosides with yields typically within ≈2-10%. The relatively low yield by Orgel and coworkers were attributed to the instability of the glycosidic bond in aqueous environment giving rise to what is known as the "nucleoside problem" (a special case of the more general "water problem") (do Nascimento Vieira et al., 2020;Hud et al., 2013;Joyce and Orgel, 1999;Kim et al., 2016). Challenges including the nucleoside problem have led scientists to look for alternative synthetic routes that start, for example, from phosphorylated sugars and free bases (Bernhardt and Sandwick, 2014;Crowe and Sutherland, 2006;Gull et al., 2017;Ingar et al., 2003;Kim and Kim, 2019;Zubay and Mui, 2001).
The hypothesis being tested here is that Nature's stereo-selection of the present-day canonical nucleosides/nucleotides is consistent with an energy-driven natural selection. Thus, the two anomeric forms of the nucleosides/nucleotides were studied as they occur within both DNA and RNA and compared for their thermodynamic stabilities.
In final account, since the deamination of cytosine transforms it into uracil with a consequential change in the genetic message, selection pressures may have driven the transition from uracil to thymine in DNA (Poole et al., 2001). This question has long been raised (Lesk, 1969) but appears to have never been resolved on the grounds of relative total molecular energies to the best of the authors' knowledge. Since there appears to be no a priori reason for Nature to have selected thymine to be incorporated exclusively in DNA and uracil in RNA (with exceptions), mixed ("wrong", or non-canonical) nucleotides have also been considered in our calculations as an available choice for natural selection.
To sum-up, we observe a preponderance of -over the -anomers in present-day nucleosides(tides). This work indicates that such preponderance is consistent with thermodynamic parameters calculated quantum chemically within the assumptions and approximations of the study. The known specificity of uracil to RNA and thymine to DNA is also consistent with these results. While both kinetics and thermodynamics, and the interactions with solvent molecules and ions, must have all driven together evolutionary selection of life's building blocks, in this work we restrict the question as to whether there exists an inherent preference in the building block themselves that is consistent with the observed "naturally selected" present day nucleic acids.

Computational details
The structure of each nucleoside (sugar + nitrogenous base) in two anomeric forms ( and ) where constructed with the graphic interfaces of Hyperchem 7.0 (1996) and GaussView 5.0 (Dennington et al., 2009). The initial 20 nucleosidic structures {[(2 sugars (ribose, and 2 ′ -deoxyribose)) × (5 bases (adenine (A), thymine (T), guanine (G), cytosine (C), and uracil (U)))] × 2 ( and )} were subjected individually to a soft potential energy hypersurface (PES) scan with respect to the angle that governs the N-glycosidic bond between the base and the sugar. "Soft scan" means that the only constrain is the angle being scanned stepwise while all other geometrical parameters are allowed to relax in response to that angle.
The above-mentioned PES scans were performed in the following way. -matrices for the ribofuranose and 2 ′ -deoxyribofuranose sugar, each in both the and configuration, were read into the programme Granadarot (Montero, 2019;Montero et al., 1998). The Granadarot algorithm was used to create, for the ribofuranose, 1,000 different conformers by randomly varying the five dihedral angles that involve all the sugar's hydroxyl groups (4 angles of the H-O-C-C type, in addition to the O-C5 ′ -C4 ′ -C3 ′ angle -also known as angle (not to be confused with the anomeric label)). For the 2 ′ -deoxyribofuranose, a similar procedure was used to also generate 1,000 conformers with the difference that now there are only 3 angles of the H-O-C-C type. The number of initial random conformers (1,000) strikes a balance between a good sampling of the PES and computational costs.
For each of these initial randomized sugar structures (4,000 in total), the geometries were optimized at the semiempirical PM7 level of quantum chemical theory, a method that includes empirical corrections for dispersion and hydrogen bonding interactions (Stewart, 2007(Stewart, , 2013. By including these corrections, PM7 has an important advantage over other semiempirical methods that generally do not account for dispersion and hydrogen bonding explicitly. These PM7 geometry optimizations were conducted through a gradient minimization of the total energy using the MOPAC2016 package (Stewart, 2019) until the forces on the nuclei were negligible. These geometry optimizations were performed twice: Once in vacuum phase and a second time with the COSMO continuum solvation model (Klamt and Schüümann, 1993).
The geometry of every sugar structure of the ′ that survived the initial screening, was re-optimized without constraints at the density functional level of theory (DFT) (Koch and Holthausen, 2001;Parr and Yang, 1989;Scholl and Steckel, 2009) using the B3LYP/6-31G( , ) functional (Becke, 1993;Lee et al., 1988) / basis set (Hehre et al., 1986) combination. Aqueous solvation was accounted for in the DFT calculations using an integral equation formalism variant of the "polarizable continuum model" (IEFPCM) (Miertus et al., 1981;Tomasi et al., 2010;Tomasi et al., 2005) implemented in Gaussian 16 (Frisch et al., 2019), the software package used in all DFT calculations in this work.
The so-called "water problem" (Joyce and Orgel, 1999) describes the consensus understanding that the primordial soup has been non-polar in nature or, at least, had a controlled exposure to water (see for example Ref. (do Nascimento Vieira et al., 2020) and literature cited therein). Hence, the primary results to be considered and discussed here are those in the vacuum phase as a surrogate for non-polar environment. Solvation modeling has been included, however, to also test the effects of this very "water problem" but also for completion since aqueous medium predominates in contemporary living systems.
Solvation remains an open problem for quantum chemical calculations. One has to pick from the dichotomy of explicit solvation or the modeling of its effects by placing the solvent in a cavity within a bulk dielectric continuum, that is, implicit solvation (Cramer, 2002;Cramer and Truhlar, 1995;Cramer and Truhlar, 1999;Foresman and Frisch, 1996;Marenich et al., 2009;Miertus et al., 1981;Tomasi et al., 2010;Tomasi et al., 2005;Tomasi and Persico, 1994). Explicit solvation is best to describe strong localized interactions between the solute molecule and immediate solvation shell molecules, while continuum solvation modeling is better tuned to capture the long-range averaged effects of solvation by the solvent bulk. Explicit solvation, ideally, entails a gradual addition of solvent molecules until the convergence of some relevant parameters, which is impractical here in view of the large number of studied systems. Hence, the second best option, that is, the continuum modeling, has been applied in this work.
Every geometry optimization in this work has been followed by a harmonic vibrational analysis and all were found to exhibit no imaginary frequency as required to confirm their status as local minima on the PES. For each one of the eight groups described above, the most stable structure of the sugar that resulted from the DFT optimization was saved for the next step and the rest of the structures were discarded.
The five nucleobases (A, G, C, T, U) were optimized at the DFT-B3LYP/6-31G( , ) level of theory, in a similar procedure as the one outlined above for the sugars, in vacuum and in solvent phase. These optimized structures were then stitched to the sugars leading to 40 separate initial N-nucleoside guess geometries (5 bases × 2 sugars × 2 configurations × 2 phases/environments).
For consistency, the N-glycosidic bond, C1 ′ -N1 in pyrimidines (Y) or C1 ′ -N9 in purines (R) was initially set to 1.52 Å while the dihedral angle H-C1 ′ -N1(Y)/N9(R)-C was set initially at −161.9 • . Each of the 40 nucleoside structures was then subjected to a fully relaxed scan around this dihedral in 6 steps each of 60 • . The lowest structure from this scan was refined by subjecting it to a final fully-unconstrained optimization to obtain the final structure of the nucleosides in vacuum and in solvent. A harmonic frequency calculation was performed as usual to ensure that the final structures are indeed minima on the PES.
Finally, a mono-anionic dihydrogen phosphate group (H 2 PO − 4 ), the form dominant at pH 6.5 (Sponer et al., 2011), has been optimized unconstrained both in the vacuum phase and in continuum solvent at the DFT-B3LYP/6-31G( , ) level of theory. The optimized phosphate was then attached to the nucleosides at C5 ′ -OH setting the initial O-C5 ′ -C4 ′ -C3 torsion angle to 30.9 • (the standard angle for the stored structures in GaussView). A soft scan was then performed in 6 steps of 60 degrees, and again the lowest energy conformer of the nucleotide was retained for further analysis. The procedure outlined above yields 80 final optimized structures: 40 for each class (nucleoside, and nucleotide), 20 in gas-and 20 in solution-phase.
The steps described above proceed in the following order: sugar + base → nucleoside followed by a geometry optimization of the nucleoside, and then by the reaction nucleoside + phosphate → nucleotide followed by a geometry optimization of the nucleotide. Since every step of these two concatenated "reactions" is followed by a geometry optimization, a change in the order of these reactions does not necessarily yield the same geometries (and energies). Hence, the procedure described so far has been repeated except by reversing the order of the reaction, that is: sugar For a given pair of / -anomers, the difference in their energies (relative energy) is defined as (equation (1)): where Δ denotes Δ (the difference is of the total energies), or Δ (ZPE) (the difference in the total energies corrected for zero-point vibrational energies (ZPEs)), or Δ • (the difference in the Gibbs energies at standard conditions). The inclusion of solvation effects will be indicated when necessary using extra symbols. The DFT-B3LYP/6-31G( , ) level of theory has been chosen for this study as a reasonable compromise of accuracy and speed/feasibility. The error bars for a similar level of theory, namely, DFT-B3LYP/6-31+G( , ) have been benchmarked by Zhao and Truhlar's to be around 3.6 kcal/mol (Zhao and Truhlar, 2008). Zhao and Truhlar obtained this estimate by comparing the calculated and experimental thermodynamic data for 177 main-group compounds (Zhao and Truhlar, 2008). On that basis, we may take the intrinsic uncertainty of the method used in this work to be around ≈3 -4 kcal/mol.

Two hypothetical synthetic pathways and their Gibbs energies
The two pathways of the formation of a nucleotide are depicted in Fig. 2 and are not equivalent. That non-equivalency is due to the effect of the first condensation on geometry which leads to final products trapped at different wells on the potential energy surface of the nucleotide. In other words, (a + b) and (c + d) are different pathways with different Δ • reaction . Recapping, the two condensation sequences considered are: (1) The condensation of a sugar with a base to obtain the N-nucleoside followed by the condensation of the nucleoside with a dihydrogenphosphate anion (H 2 PO − 4 ) at the 5 ′ position to obtain the nucleotide. The Gibbs free energy of this reaction is defined as (equation (2)): ] . (2) (2) The condensation reaction of a H 2 PO − 4 group with a sugar at C5 ′ first, followed by a condensation of the sugar 5 ′ -monophosphate with the base. In this case the Gibbs energy of reaction is defined as (equation (3)): (3)

Fig. 2.
Two different pathways for constructing the -and -anomers of nucleosides and nucleotides. (Reaction pathways referred-to in the text and tables are labeled with lower-case letters: Classical pathway (a + b) and alternative pathway (c + d); R=H,OH for DNA and RNA, respectively).
As a prelude to this study, we first revisit the relative stabilities of the furanose forms of the sugars themselves. A 13 C NMR study complemented with a statistical mechanics analysis by Dass et al. demonstrates a strong temperature-(gradient)-dependence of the equilibrium ratios of the various open-chain and cyclic forms of ribose sugar (Dass et al., 2021). These authors are simulating the conditions of temperature and temperature-gradient near hydrothermal vents to answer the question of which form(s) of the ribose sugars were favored at early prebiotic times. From these authors' Fig. 2 (Dass et al., 2021), in pure aqueous solution at 25 • C, the -pyranose ( P-form) is dominant with a mole fraction of ≈0.6, followed by the P-form with a mole fraction of ≈0.2. These authors' figure indicates much lower populations for the two furanose forms under these conditions, both anomers having similar mole fractions of ≈0.1 each. This means that at 25 • C the pyranose form dominates largely and especially in its -form. These figures do not change significantly when the medium simulates Hadean waters (Dass et al., 2021).
Interestingly, Fig. 2 of (Dass et al., 2021) features break-even points of the eight curves. Beyond a temperature of ≈130 • C, the population is inverted with an increasing dominance of the furanose form, starting with a small excess of ≈1.5 × favoring the form at the beginning, with a gap between the and the populations that widens as the tempera-ture increases (whether in pure or Hadean water) reaching a / ratio of ≈2 at ≈130 • C (Dass et al., 2021). It is inferred, in conclusion, that the early hot atmosphere may have been the driver for the selection of the -furanose form that remained to this day after the temperatures have cooled down.
This proclivity to select the -furanose form can be enhanced at lower temperature in the presence of large temperature gradients as it occurs near hydrothermal vents for example. At room temperature, however, the data of Dass et al. show a slight but not substantial propensity for the -over -furanose, whether in pure aqueous solution or one that simulate Hadean medium (Dass et al., 2021). Azofra et al.'s (Azofra et al., 2014) DFT exploration of the potential energy landscape generated thousands of rotamers of (deoxy)ribopyranose, (deoxy)ribofuranose in their open chain and -andanomeric forms. In their study, these authors report results based on DFT vacuum-phase calculations with both the B3LYP/6-311+ +G( , ) and M06-2X/6-311 + +G( , ) chemical models (Azofra et al., 2014). The authors also find a dominance of the pyranose form over the furanose from 0 K to room temperature (298 K), in agreement with the more recent experimental results of Dass et al. (Dass et al., 2021). Meanwhile, within the small proportion of furanose, in the case of ribofuranose, the B3LYP/M06-2X functional predicts a higher Boltzmann populations of -ribofuranose (3.4/0.2% (298 K) and 1.5/0.1% (0K)) than theribofuranose (0.6/0.0% (298 K) and 0.2/0.0 (0K)). A similar trend is also found for the -2 ′ -deoxyribofuranose forms, in which case the re- Table 1. Differences in energies between the most stable -and -anomers for the sugars 2 ′ -deoxy (d) or (r)ibose in vacuum and in aqueous environment in kcal/mol (equation (1)). Included differences are between: The total energies without (Δ ) and with zero-point vibrational correction (ZPE) (Δ (ZPE) ), and Gibbs energies Δ 0 at STP conditions. The listed results are from DFT (B3LYP/6-31G( , )) calculations. The integral equation formalism of the polarizable continuum model (IEFPCM) of solvation has been used to generate the results incorporating aqueous solvation at the same level of DFT theory.   (Azofra et al., 2014) study is that the pyranose form is more populated than the furanose form at room (and lower) temperatures, but the results using both DFT functionals indicate a slight advantage, within the furanose population, for the -anomer. However, the differences in energies and their corresponding Boltzmann's populations are probably within the error bars of the DFT calculations. Hence, we may conclude that these studies do not indicate a clear advantage of one furanose anomer over the other at room temperature. Cocinero et al. performed a combined experimental/computational study (rotational FT-MW spectroscopy and three levels of theory: MP2/6-311+ +G( , ), B3LYP//6-311 + +G( , ), and M06-2X//6-311 + +G( , )) that again shows the almost exclusive dominance of the pyranose form in the gas-phase at room temperature (Cocinero et al., 2011). However, these authors also demonstrate that aqueous solvation increases the propensity for the furanose form at room temperature compared to the gas-phase (Cocinero et al., 2011).
Our results listed in Table 1 are consistent with the findings of Azofra et al.'s (Azofra et al., 2014) suggesting a borderline advantage of the -over the -furanose anomer at room temperature, whether in solvent or in the vacuum phase, and for either ribose or deoxyribose (see first three entries of Table 1, especially the third, for differences in Gibbs energy which are less than . 3 kcal/mol). In fact these results are consistent with all those mentioned above by other workers since they indicate an inconclusive advantage of one form or another. Table 1 also lists the effect of adding the phosphate group at position 5' of the sugar. The phosphate group has a significant effect whereby the slight advantage of the -over the -forms of the free sugars is now reversed. As can be seen from the entries in the Table, the sugar monophosphates are slightly more stable in the -forms, whether in gas-or solutionphase and whether ribose or deoxyribose (differences in Gibbs energy are all above a kcal/mol, approximately 2 -3 kcal/mol for RNA and 4 -5 kcal/mol in DNA, in vacuum, and with smaller magnitudes (but still negative values) in solution) ( Table 1)

Which nucleoside anomer is more stable: or ?
Ball-and-stick representations of the optimized geometries of all studied structures and their Gibbs energies of inter-conversion can be found in the SI (Figs. S.5 -S.14). Table 2. Differences between the energies of the most stableand -anomers of the 2 ′ deoxy (d) or ribonucleosides in vacuum and in aqueous environment (equation 1). Included differences are between: The total energies without (Δ ) and with zero-point vibrational correction (ZPE) (Δ (ZPE) ), and Gibbs energies Δ 0 . All results are obtained from DFT (B3LYP/6-31G( , )) calculations. The integral equation formalism of the polarizable continuum model (IEFPCM) of solvation has been used to generate the results incorporating aqueous solvation at the same level of DFT theory.  The 20 differences in energies between the -and -anomers for each of the nucleosides obtained from equation (1) are listed in Table 2. For most cases, the difference in stabilities of the anomers falls within the probable error bars of the theoretical method (DFT-B3LYP/6-31G( , )), that is, approximately 3 -4 kcal/mol. In the case of DNA, all differences in Gibbs energies are <2 kcal/mol, whether in vacuum or in solution and for all five nucleosides. One notices that, in all cases, solvation magnifies the relative stability of the -anomer by ≈2 kcal/mol. As for RNA in vacuum, Gibbs energies suggest a slight relative stability of the -anomer of G over the -form (by ≈2 kcal/mol), while the reverse is true for the rest, with the -form being more stable by ≈2 kcal/mol for A, T, and U, and by ≈4 kcal/mol for C.
In solution phase, and judging from the relative Δ values, the -forms of the purines are more stable than their counterparts by ≈2 kcal/mol while for pyrimidines the differences between the two forms are negligible (below chemical accuracy of 1 kcal/mol), Table 2. These energy differences between the anomers are within an order of magnitude of the thermal energy at room temperature B (298 K) ≈0.5 kcal/mol, and that at 70 • C ( B (343 K) ≈0.7 kcal/mol), a temperature beleived to have prevaled during the Archean eon when the first forms of primordial life may have emerged (Garcia et al., 2017).
The calculated energy differences listed in Table 2, whether Gibbs or total energies, fall within the probable error bars of the method and hence, while indicative, cannot be considered definitive. The consistency of the trend in Table 2 may allow us to conclude that there appears to be a small thermodynamic advantage for the -anomer over the -anomer in solution in both nucleic acids. In vacuum phase, in the case of DNA, the Gibbs energy differences are below chemical accuracy and hence the two forms are iso-energetic. Meanwhile, for RNA in vacuum generally the -anomer is more stable (slightly for A, T, and U, and more notably for C) except for G for which the -anomer has a slight advantage. Table 3. Differences between the energies of the most stable -and -anomers of the 2 ′ deoxy (d) or (rib)onucleotides in vacuum and in aqueous environment (equation 1) as given by the reaction pathway sequence (a) and (b) of Fig. 2. Included differences are between: The total energies without (Δ ) and with zero-point vibrational correction (ZPE) (Δ (ZPE) ), and Gibbs energies Δ 0 . All results are obtained from DFT (B3LYP/6-31G( , )) calculations. The integral equation formalism of the polarizable continuum model (IEFPCM) of solvation has been used to generate the results incorporating aqueous solvation at the same level of DFT theory. (1) NMP = unspecified nucleoside 5 ′ -monophosphate (nucleotide).

Which nucleotide anomer is more stable and in what conditions?
Tables 3 and 4 give the relative (Gibbs) energies of the and anomers obtained via the "classical" pathway ((a + b) - Table 3 (1). Since all individual energies are negative, entries in these tables that are negative indicate that the anomer is more stable and vice versa.
Assuming RNA preceded DNA chronologically, a glance at Table 3 (nucleotides formed via pathway (a + b)) suggests that -anomers are favored across the board in vacuum/non-polar medium. In this case, the relative Gibbs energies, ordered in decreasing magnitudes, are Δ 0 (UMP) ≈ −13 kcal/mol, Δ 0 (AMP) ≈ −10 kcal/mol, Δ 0 (CMP) ≈ −9 kcal/mol, Δ 0 (GMP) ≈ −7 kcal/mol, Δ 0 (TMP) ≈ −3 kcal/mol. Interestingly, the least remarkable difference is that of the rarely seen nucleotide, that is, TMP (favoring the -form by only 3 kcal/mol) as opposed to the one that actually occur in today's RNA, that is, UMP which exhibits, in fact, the highest differential stability favoring the -anomer (by 13 kcal/mol). Coincidence? There is no way to tell for certain, but suggestive it is.
Continuum solvation reduces the clear advantage of the -anomer over their counterparts to the level of computational noise, yet with consistency (except for a small reversal ≈2 kcal/mol for AMP) ( Table 3). As emphasized above, the results including solvation can be taken as a qualitative indicator only given the lack of localized solventsolute interaction(s) that may stabilize or destabilize the system.
As for DNA, the vacuum calculations suggest a considerable advantage for the canonical forms of purines and the reverse for pyrimidines. The advantage of the -form is particularly marked in the case of dGMP by ≈17 kcal/mol of more stability (lower energy) compared to theform. In aqueous solution, the results are inconclusive being, probably, within computational uncertainties (as discussed above). (1) NMP = unspecified nucleoside 5 ′ -monophosphate (nucleotide).
Moving to the path (c + d) (Fig. 2), Table 4 suggests that, in vacuum (or in non-polar medium), the two forms of the RNA nucleotides have indistinguishable stabilities within the probable error bars of the method except for TMP and UMP. In these latter two cases, the form is considerably more stable with ≈8 and 11 kcal/mol, respectively. These observations are inconsistent with today's state of affairs on three grounds: ( ) AMP and GMP are predicted to have energies marginally favoring their -forms, and, more importantly ( ) TMP and UMP are much more stable in their -form especially UMP. Since these contradict what is being observed -that pathway is less likely to have been Nature's choice leaving the other pathway (a + b) as a more probable scenario.
The addition of a phosphate group on the sugar first then the base last, in the (c + d) pathway, creates ample opportunity for the highly anionic oxygens of the phosphate to hydrogen bond with the sugar's hydroxyl groups (see Figs. S.25 -S.34). In the classical pathway, (a + b), when the base is added first on the sugar, the acidic hydrogens of the base in the form -being in closer proximity to the phosphate group (both are on the same face of the sugar mean plane) -can form hydrogen bonds with the latter (see Figs. S.15 -S.24). This hydrogen bonding locks the phosphate group on that side of the mean plane of the sugar and competes with its capacity to form more hydrogen bonds with the sugar hydroxyl groups.

The order of addition of the three components of nucleotides matters
The hypothesis being tested here is captured by the following question. Suppose a series of net reactions were available in prebiotic times that lead to the formation of the first nucleotides, whether those of DNA or of RNA. Is there a particular order of addition that is more energetically favorable? In other words, which one of the following net reactions, Table 5. Gibbs (Δ 0 ) energies at standard pressure and temperature in kcal/mol for a hypothetical condensation leading to the 5 canonical ribonucleosides 5 ′ -monophosphate (NMPs) (nucleotides) and their counterparts in vacuum and in aqueous environment. The Gibbs energies of the two reaction pathways are defined by Eqs. (2) and (3). "Reaction" pathways are labeled according to Fig. 2. (From DFT calculations at the B3LYP/6-31G( , ) level of theory, with aqueous solvation modeled with the IEFPCM model). (1) NMP = unspecified nucleoside 5 ′ -monophosphate (nucleotide). (2) The ΔΔ values are the Gibbs energies of reaction along a given two-step pathway for the anomer minus the same pathway but for the anomer. For example, from the pathways labeled in Fig. 2, ΔΔ 0 Rx( + ) is the difference of . In turn, , for instance, is the sum of the Gibbs energies for the condensation reaction along the pathway (a) then (b) yielding the nucleotide. Expressed symbolically, Hence, in general: ΔΔ 0 + = , where = a, c, and = b, d.
fleshed-out in Fig. 2, is energetically more favorable, i.e. leads to a more negative Δ : • Reaction ( + ): sugar + base where Ns = nucleoside and Nt = nucleotide in either the -or theanomeric form, the sugar could be either ribose or deoxyribose, and 5 ′ -SMP = 5 ′ -sugar monophosphate. The subscripts in bracket indicate the addition sequence. There are a total of 2 (pathways) × 2 (sugars) × 5 (bases) × 2 (anomers) × 2 (solvation conditions) = 80 "reactions" in total the Gibbs energies of which are summarized in Table 5 (see also Figs. 3 and 4). The Gibbs energies of the two reaction pathways are defined by Eqs.
(2) and (3). It is important to remind the reader that since the optimized geometry, and hence the energy, of the final product depends on the sequence, we have different "products" (local minima) (Nt (a+b) ≠ Nt (c+d) ). A glance at Table 5 suggests that the pathway (a + b) for the -anomer is the most favored (more exergonic) in vacuum yet both pathways are endergonic in the continuum solvation model used. Hence the following discussion will focus on pathway (a + b) with the vacuumphase results examined first. As can be seen from Table 5 and Fig. 3, the condensation reaction between a base and a sugar (reaction (a)) is not favored thermodynamically either in vacuum or aqueous solution. However, the next coupled step in this pathway, step (b), is sufficiently exergonic to drive the entire reaction to competition with negative free energy falling in magnitude within the range 11 kcal/mol < |Δ | < 24  kcal/mol in the case of the -anomers, and to a lesser extent in the case of the -anomers in which case the magnitudes of the energies of reaction are 6 kcal/mol < |Δ | < 20 kcal/mol.
It is further noticed from Table 5 that step (b) reactions are generally more exergonic for the -anomers of RNA than for their DNA counterparts. This second step, (b), is also exergonic for the -anomers. This step, for most -nucleotides, is less exergonic (and significantly so) than the corresponding -nucleotides except for the deoxyribonucleotides of the pyrimidine bases.
The overall reaction energies strongly favor the (a + b) pathways for all ribonucleotides in vacuum and for both anomers. The overall -pathways are typically doubly or triply more exergonic than thepathways (as can be seen from Table 5 and Fig. 3) except in the case of the pyrimidines-deoxy-ribonucleotides. The differential Gibbs energies between the -and -pathways, ΔΔ (a + b) , that captures the effect of ∕ isomerization on the overall reaction Gibbs energies are more exergonic for the -reactions except for the pyrimidines in DNA.
In aqueous medium, an examination at the overall energies of the (a + b) pathways shows that solvation flips all the vacuum-phase spontaneous reactions to non-spontaneous ones (Table 5 and Fig. 3). This is in line with the "water problem" (Joyce and Orgel, 1999) which seems to support a non-polar primordial soup. Interestingly, the reactions of the -anomers are all more endergonic than those with the -reactions, again suggesting that -in this case -the -reaction is "less forbidden", so to speak, than the -counterpart.
In vacuum, for the alternative (c + d) pathway, the first step, namely (c), the condensation of the phosphate and the sugar is exergonic across the board, especially for the -form (Table 5, and Fig. 4). Meanwhile Table 6. Differences between the energies of the canonical (predominant) nucleosides and their minor counterparts (Fig. 5) in vacuum and in aqueous environment (energies of the canonical form minus that of the minor form). Included differences are between: The total energies without (Δ ) and with zero-point vibrational correction (ZPE) (Δ (ZPE) ), and Gibbs energies Δ • . All energies are in kcal/mol and are obtained from DFT (B3LYP/6-31G( , )) calculations. The sugar exchange (or swapping) is written as "chemical reactions" in Eqs. (4) and (5). The integral equation formalism of the polarizable continuum model (IEFPCM) of solvation has been used to generate the results incorporating aqueous solvation at the same level of DFT theory.
when the sugar is substituted at its 5 ′ position by the phosphate group, the energies of step (d) do not suggest particularly strong trends. The overall reaction energy, though, still indicates that all are spontaneous but to lesser extents than the classical pathway (as mentioned above). Continuum solvation generally flattens the magnitudes of all reactions in the alternative pathways. In this case, step (c) is converted to a nonspontaneous reaction for the -anomers and marginally exergonic for all the -anomers. The next step (d) in water is unfavorable in all cases leading to an overall endergonic (c + d) pathway in all cases (as in the classical pathway), with -on average -a marginally less favorable reaction in aqueous medium for both anomers. It is concluded that the classical pathway for the -anomers (the anomer which prevails in today's nucleic acids) emerges, again, as the generally favored thermodynamic choice.

Sugar exchange reactions between U and T nucleosides and nucleotides
Lesk posed the question of "Why does DNA contain Thymine and RNA Uracil?" in the title an important paper that appeared as early as 1969 (Lesk, 1969). Lesk suggested several reasons for the choice including the suggestion of a slight thermodynamic advantage of the dominant forms over the minor forms. This point completes the present study which is addressed in a similar fashion as the manner as above.
From two bases (U, T), two sugars (ribose and 2 ′ -deoxyribose), and two configurations ( , ) there are 8 possibilities (see Fig. 5). The top panel of this figure represents the (canonical) nucleosides that predominate in the genetic material of all contemporary living organisms, that is to say, thymine on deoxyribose (dT) and uracil on ribose (U). The bottom panel of Fig. 5 presents those that occur infrequently in the genetic material, i.e., T and dU. Table 6 compares the calculated energies of the thiamine and uracil nucleosides in hypothetical sugar exchange reactions, defined by "chemical reactions" (reaction 4) and (reaction 5) below, (where, for clarity, "r" is added to denote ribose sugar): and -rT + -dU ⟶ -dT + -rU, in both the vacuum phase and solvent phase. These reactions switch the pair of bases from their sugars in their canonical nucleosides to the sugars in their non-canonical ones delivering the energies of sugar double exchange "reactions". The energies listed in Table 6 indicate a consistent lack of any significant energy difference upon affecting this transformation in all types of calculations The minor forms, i.e., thymidine (T) and 2 ′ -deoxyuridine (dU) which are not normally incorporated in RNA and DNA, respectively. The sugar exchange (or swapping) is written as "chemical reactions" in Eqs. (4) and (5). Also see text and Table 6. (The star ( * ) denotes the anomeric center (C1 ′ ) of the sugar.) and energies in the cases of the nucleosides (the first two rows in Table 6). Thus, for the nucleosides, all differences in the Table (whether of uncorrected total energy differences, ZPE-corrected energy differences, or Gibbs energy differences are below chemical accuracy) suggest essentially equal stability of the "correct" and "wrong" nucleosides. Thus, there is no clear thermodynamic advantage of attaching one base on one particular sugar, whether in the -or in the -configurations, in the vacuum or solution phase.
Next, the effect of attaching a phosphate group at position 5 ′ on the relative stabilities of the canonical and non-canonical nucleotides is explored. As mentioned above, we have two distinct step-wise additions of the three components of the nucleotides as illustrated in Fig. 2. The energies of the "sugar exchange reactions" for both pathways can be found in Table 6. From the listed entries in this Table, the most remarkable one is that of the exchange of the sugar in the -forms in vacuum (or non-polar medium), clearly favoring the canonical form by up to 10 kcal/mol along the (a + b) pathway (adding the base first then the phosphate). This again reinforces our suggestion that the classical pathway is favored and the thermodynamic advantage of the -anomeric form existing in contemporary nucleic acids. This advantage of the 10 kcal/mol of the "correct" over the "wrong" pairing of base with sugar is reduced to noise below chemical accuracy in aqueous solution.
The alternative pathway (c + d) (phosphate first then base) leads to a reversal of the energetic advantage of the -forms in vacuum in favor of minor form by ≈4 kcal/mol in terms of Δ . In aqueous solution, the (c + d) pathway slightly favors the canonical forms by Δ ≈ 2 kcal/mol. See Table 6.
From these considerations it may be inferred that only when the phosphate is among the "reactants" there exists an advantage of the canonical forms: ( ) by ≈14 kcal/mol given the order of addition is base first then phosphate in a non-polar medium, or ( ) by ≈2 kcal/mol in aqueous environment for the reverse order of addition, that is, phosphate first.

Conclusions
The calculations suggest a slight thermodynamic advantage that favors the selection of the -over the -anomers. This is aligned with the concept of an evolutionary "energetic" selection of the fittest. Calculations accounting for implicit solvation in aqueous medium renders either pathways (a + b) or (c + d) thermodynamically unfavorable (Figs. 3 and 4). This last observation is consistent with the well-known "water problem".
The present work suggests an order of combination of the three nucleotide components: The condensation of the base with the sugar is first followed by the condensation of the phosphate at the 5 ′ , second. That is, the "classical pathway" emerges as the natural choice for the sequential addition of these components of nucleotides. As mentioned already, the addition of the two first reactants changes the geometries sufficiently to result in different geometries (and energies) when the third reactant is then added as the last condensation.
The final question addressed in this work is whether Nature's choice of incorporating U in RNA and T in DNA is consistent with a thermodynamic explanation. The results suggest an affirmative answer. Indeed, a comparison of the canonical . the non-canonical nucleotides of these two bases, ( ) reinforces the conclusion that the classical pathway is favored, and ( ) indicates an advantage of the canonical pairs compared to the no-canonical pairs when gauged by the "sugar exchange reactions". Furthermore, this thermodynamic advantage exists only for the -anomers (10 kcal/mol) and vanishes in the case of the -anomers. Thermodynamics have been invoked as a driver behind the syntax of the genetic code as seen today (Grosjean and Westhof, 2016;Klump et al., 2020). Three decades ago, an editorial in Nature had the intriguing title "[ ]s Darwinism a thermodynamic necessity?" (Maddox, 1991). That editorial highlights a paper by Torres in which Darwinian "fitness" has been formulated in thermodynamic terms (Torres, 1991). Torres addresses the logical fallacy of circulus in probando (circle in proving, commonly known as circular reasoning) of the concept of survival of the fittest (Torres, 1991). The fallacy is centered on that fittest is, by definition, the ability to be a survivor (Torres, 1991). These earlier works provide the context of the present one underscoring thermodynamics' role in driving the natural selection of today's canonical nucleotides.
This research addresses the question of why Nature selected the building blocks of nucleic acids as we know them today? The answer is sought in thermodynamic terms, with the underlying hypothesis that free energies are a factor that may have driven particular evolutionary choices. Other factors may have contributed at a particular selection of a molecular form. Such factors may include, for example, kinetics, catalysis including self-catalysis, interaction with light, etc. Our purpose here is much more modest and restricted as the questions addressed suggest.
Other factors that were not considered in this work is the effect of the inclusion of ions as well as of explicit solvation on the energy ordering. For now, the question this paper is addressing is, again, more modest and, that is, in the absence of these additional and relevant factors, what is the energy ordering of the nucleotides in their isolated forms with and without continuum solvation?

Author contribution statement
Lázaro A. M. Castanedo, Chérif F. Matta: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.

Funding statement
Professor Chérif Farid Matta was supported by Funding Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada (NSERC DG-2015), and by CFI (CFI-Leaders Opportunity Fund 2020).

Declaration of interests statement
The authors declare no conflict of interest.

Additional information
Supplementary content related to this article has been published online at https://doi .org /10 .1016 /j .heliyon .2022 .e09657.